Segmentation of Speech and Humming in Vocal Input
Abstract
Non-verbal vocal interaction (NVVI) is an interaction method that uses human-produced sounds other than speech, such as humming. NVVI complements traditional speech recognition systems with continuous control. To combine the two approaches (e.g. “volume up, mmm”), the input sound signal must be segmented into speech and NVVI. This paper presents two novel methods for segmenting speech and humming. The first classifies MFCC and RMS parameters using a neural network (MFCC method), while the second computes volume changes in the signal (IAC method). The two methods are compared using a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms the IAC method in accuracy, precision, and recall.
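The abstract does not define how the IAC method measures volume changes, so the following is only a rough illustration of the general idea that humming tends to hold a steadier volume than speech. It computes frame-wise RMS energy and a normalised frame-to-frame change score. The function names, the 160-sample frame length, and the 0.2 threshold are assumptions made for this sketch and are not taken from the paper.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS energy of each non-overlapping, full-length frame."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in starts
    ]

def volume_change(rms):
    """Mean absolute frame-to-frame RMS difference, divided by the mean RMS."""
    if len(rms) < 2:
        return 0.0
    mean = sum(rms) / len(rms)
    if mean == 0.0:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(rms, rms[1:])]
    return (sum(diffs) / len(diffs)) / mean

def label_segment(samples, frame_len=160, threshold=0.2):
    """Hypothetical rule: steady volume -> 'humming', fluctuating -> 'speech'."""
    score = volume_change(frame_rms(samples, frame_len))
    return "speech" if score > threshold else "humming"

# Synthetic stand-ins (assumed, not real recordings): a steady 200 Hz tone
# and the same tone with a slow 5 Hz amplitude envelope.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
pulsed = [
    s * (0.5 + 0.5 * math.sin(2 * math.pi * 5 * n / rate))
    for n, s in enumerate(tone)
]

print(label_segment(tone))
print(label_segment(pulsed))
```

A single global threshold like this would likely be too crude for real recordings, which is consistent with the abstract's report that the classifier-based MFCC method performed better than the volume-based one.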
Similar resources
A Music Retrieval System with a Seamless Query Interface by Humming or Song Title
We propose a music retrieval system that enables a user to retrieve a song by two different methods: by singing its melody or by saying its title. To allow the user to use those methods seamlessly without changing a voice input mode, a method of automatically discriminating between singing and speaking voices is indispensable. We therefore designed an automatic vocal style discriminator and bui...
Effective Segmentation based on Vocal Effort Change Point Detection
Non-neutral speech data has a strong negative impact on speech processing systems such as Automatic Speech Recognition (ASR) or speaker ID systems [1]. It is therefore necessary to detect and segment non-neutral speech data before further processing steps. Alternatively, the detection and segmentation of non-neutral speech segments from an input speech stream can be used in speech analysis and ...
A Romanian Syllable-Based Text-To-Speech System
In this article we present the way we have built a syllable-based TTS system for Romanian. The system contains: a text analyser capable of separating syllables from input text and detecting accentuation, a vocal database with recorded syllables, a unit matching module and a synthesizer. The analyser was built using a LEX generator by means of two sets of phonetic rules. The vocal database was generated t...
Mirex2008: Query by Humming/singing System
This extended abstract describes my submission to the QBSH (Query by Singing/Humming) task of MIREX (Music Information Retrieval Evaluation eXchange) 2008. The system takes advantage of note-based and frame-based matching methods to improve the accuracy of the Query by Singing/Humming system. First, Earth Mover’s Distance (EMD), which is note-based and much faster, is adopted to eliminate most ...
Patient-Based Assessment of Effectiveness of Voice Therapy in Vocal Mass Lesions with Secondary Muscle Tension Dysphonia
Introduction: Use of patient-based voice assessment scales is an appropriate method that is frequently used to demonstrate effectiveness of voice therapy. This study was aimed at determining the effectiveness of voice therapy among patients with secondary muscle tension dysphonia (MTD) and vocal mass lesions. Materials and Methods: The study design was prospective, with within-participant repe...